PKG: Exclude data test files. #19535

TomAugspurger · 2018-02-04T20:17:31Z

$ ls -lh dist
total 86552
-rw-r--r--  1 taugspurger  staff    10M Feb  4 14:12 pandas-0.23.0.dev0+218.g3f3b4e0bc-cp36-cp36m-macosx_10_12_x86_64.whl
-rw-r--r--  1 taugspurger  staff    12M Feb  4 14:12 pandas-0.23.0.dev0+218.g3f3b4e0bc.tar.gz
-rw-r--r--  1 taugspurger  staff   7.8M Feb  4 12:18 pandas-0.23.0.dev0+219.g4d77cd8e6-cp36-cp36m-macosx_10_12_x86_64.whl
-rw-r--r--  1 taugspurger  staff   9.5M Feb  4 14:11 pandas-0.23.0.dev0+219.g4d77cd8e6.tar.gz

Source: 12M -> 9.5M
Binary: 10M -> 7.8M

Need to do a bit more testing to make sure I didn't break anything.

And need to think about how to test this going forward to ensure we don't write tests that aren't skipped if a data file isn't present.

Closes #19320
Closes #21436

TomAugspurger · 2018-02-04T20:18:20Z

pandas/tests/io/test_html.py

@@ -65,9 +65,6 @@ def _skip_if_none_of(module_names):
                pytest.skip("Bad version of bs4: 4.2.0")


-DATA_PATH = tm.get_data_path()


These types of changes were needed because pytest.skip can't be called outside of test methods.

TomAugspurger · 2018-02-04T20:19:10Z

setup.py

@@ -722,11 +722,7 @@ def pxd(name):
      maintainer=AUTHOR,
      version=versioneer.get_version(),
      packages=find_packages(include=['pandas', 'pandas.*']),
-      package_data={'': ['data/*', 'templates/*'],


Anyone know what the data/* files were referencing? I think it was supposed to be pandas/tests/data?

IIRC this looks for a data and template sub directory in any of the packages. It doesn’t refer to just one directory.

FWIW I think if those sub directories had an init they would automatically be included per the line above this, but given that’s not the case this helps include those folders relative to any of the packages that are found
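To illustrate the point about the `''` key: setuptools applies patterns listed under the empty-string key to every package and resolves them relative to each package's own directory. The sketch below only emulates that matching with `glob` on a throwaway directory tree. The package and file names are made up for illustration, and it does not call setuptools itself.

```python
import glob
import os
import tempfile


def match_package_data(root, packages, patterns):
    """Emulate how patterns under the '' key are applied to every package.

    Each pattern is globbed relative to the package's own directory.
    Illustration only; setuptools does the real work at build time.
    """
    matched = {}
    for package in packages:
        package_dir = os.path.join(root, *package.split("."))
        files = []
        for pattern in patterns:
            for path in glob.glob(os.path.join(package_dir, pattern)):
                if os.path.isfile(path):
                    files.append(os.path.relpath(path, package_dir))
        matched[package] = sorted(files)
    return matched


def build_demo_tree(root):
    """Create a tiny, hypothetical package layout for the demonstration."""
    layout = [
        ("pkg", "__init__.py"),
        ("pkg", "data", "a.csv"),               # matched by 'data/*'
        ("pkg", "sub", "__init__.py"),
        ("pkg", "sub", "templates", "t.tpl"),   # matched by 'templates/*'
        ("pkg", "sub", "other", "x.bin"),       # matched by neither pattern
    ]
    for parts in layout:
        path = os.path.join(root, *parts)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as fh:
            fh.write("")


with tempfile.TemporaryDirectory() as tmp:
    build_demo_tree(tmp)
    result = match_package_data(
        tmp, ["pkg", "pkg.sub"], ["data/*", "templates/*"]
    )
```

Here `result` maps each package to the files its own `data/` or `templates/` sub directory contributes, which is why the patterns are not tied to a single directory such as pandas/tests/data.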

jorisvandenbossche · 2018-02-04T21:17:44Z

And need to think about how to test this going forward to ensure we don't write tests that aren't skipped if a data file isn't present.

And likewise, we should also make sure we don't accidentally write tests that are skipped due to an error in the path? (Maybe a grep for "Data files not included in pandas distribution." in the Travis log could help with that?)
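A rough sketch of that log check, written in Python rather than grep. In a full checkout the data files are present, so any line carrying the skip message would point at a wrong path. The log excerpt is hypothetical; real CI output will look different.

```python
SKIP_MESSAGE = "Data files not included in pandas distribution."


def find_data_file_skips(log_text, message=SKIP_MESSAGE):
    """Return the log lines that mention the data-file skip message."""
    return [line for line in log_text.splitlines() if message in line]


# Hypothetical log excerpt, used only to exercise the function.
sample_log = "\n".join([
    "collected 3 items",
    "SKIP [1] some/test_file.py:10: " + SKIP_MESSAGE,
    "SKIP [1] some/other_test.py:22: unrelated reason",
])

offending = find_data_file_skips(sample_log)
```

A CI step could fail the build whenever `offending` is non-empty.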

TomAugspurger · 2018-02-04T21:51:13Z

Yes, that’s a good idea. Initially I wrote a decorator to explicitly mark tests using too much data to avoid that issue, but there were too many.


jreback · 2018-02-08T01:13:33Z

pandas/tests/io/conftest.py



 @pytest.fixture(scope='module')
 def jsonl_file():
    """Path a JSONL dataset"""
-    return os.path.join(HERE, 'parser', 'data', 'items.jsonl')
+    path = os.path.join(HERE, 'parser', 'data', 'items.jsonl')


should just make this a function (or maybe a decorator)

codecov · 2018-02-25T20:30:55Z

Codecov Report

❗ No coverage uploaded for pull request base (master@95427d5).
The diff coverage is n/a.

@@            Coverage Diff            @@
##             master   #19535   +/-   ##
=========================================
  Coverage          ?    91.9%           
=========================================
  Files             ?      153           
  Lines             ?    49544           
  Branches          ?        0           
=========================================
  Hits              ?    45532           
  Misses            ?     4012           
  Partials          ?        0

Flag	Coverage Δ
#multiple	`90.3% <ø> (?)`
#single	`41.77% <ø> (?)`

Impacted Files	Coverage Δ
pandas/util/testing.py	`84.98% <ø> (ø)`
pandas/util/_test_decorators.py	`92.5% <ø> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 95427d5...dbe0c57.

pep8speaks · 2018-02-25T20:30:59Z

Hello @TomAugspurger! Thanks for updating the PR.

In the file test_foo.py, following are the PEP8 issues :

Line 3:15: W291 trailing whitespace
Line 9:20: W291 trailing whitespace

Comment last updated on June 21, 2018 at 14:14 Hours UTC

jreback · 2018-02-25T21:07:30Z

ci/script_single.sh

@@ -25,12 +25,12 @@ if [ "$DOC" ]; then
    echo "We are not running pytest as this is a doc-build"

 elif [ "$COVERAGE" ]; then
-    echo pytest -s -m "single" --strict --cov=pandas --cov-report xml:/tmp/cov-single.xml --junitxml=/tmp/single.xml $TEST_ARGS pandas
-    pytest -s -m "single" --strict --cov=pandas --cov-report xml:/tmp/cov-single.xml --junitxml=/tmp/single.xml $TEST_ARGS pandas
+    echo pytest -s -m "single" -r xXs --strict --cov=pandas --cov-report xml:/tmp/cov-single.xml --junitxml=/tmp/single.xml $TEST_ARGS pandas


maybe should make a variable that holds all of these options for both the echo and the run to avoid duplication

jreback · 2018-02-25T21:08:07Z

pandas/tests/conftest.py

+            if request.config.getoption("--strict-data-files"):
+                raise ValueError("Failed.")
+            else:
+                pytest.skip("Data files not included in pandas distribution.")


maybe add the path name in the message
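A minimal sketch of what including the path in both messages could look like. To keep it runnable without pytest, a stand-in exception replaces `pytest.skip`, and a plain `strict` argument replaces the `--strict-data-files` option lookup. The function name and exception class are hypothetical, not the PR's actual code.

```python
import os
import tempfile


class DataFileMissing(Exception):
    """Stand-in for a skip; a real fixture would call pytest.skip instead."""


def resolve_data_path(base_dir, *parts, strict=False):
    """Join the parts and make sure the file exists, naming it on failure."""
    path = os.path.join(base_dir, *parts)
    if not os.path.exists(path):
        if strict:
            # Strict mode: a missing file is an error, not a skip.
            raise ValueError(
                "Could not find file {} and strict mode is enabled.".format(path)
            )
        raise DataFileMissing(
            "Could not find {}. Data files not included in pandas "
            "distribution.".format(path)
        )
    return path


with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "items.jsonl"), "w").close()
    found = resolve_data_path(tmp, "items.jsonl")
    expected = os.path.join(tmp, "items.jsonl")
    try:
        resolve_data_path(tmp, "missing.csv", strict=True)
        strict_message = None
    except ValueError as exc:
        strict_message = str(exc)
    try:
        resolve_data_path(tmp, "missing.csv")
        skip_message = None
    except DataFileMissing as exc:
        skip_message = str(exc)
```

With the path in the message, a skipped or failed test says which file was expected.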

jreback · 2018-02-25T21:08:58Z

pandas/tests/io/test_common.py

@@ -170,6 +170,8 @@ def test_read_non_existant(self, reader, module, error_class, fn_ext):
    ])
    def test_read_fspath_all(self, reader, module, path):


I think you can generally just autouse the datapath fixture

jorisvandenbossche · 2018-06-20T15:55:31Z

doc/source/whatsnew/v0.23.2.txt

+Build Changes
+-------------
+
+- The source and binary distributions no longer include test files, resulting in smaller download sizes. Tests relying on these files will be skipped when using ``pandas.test()``. (:issue:`19320`)


test files -> "test data files" or "data files for testing"

TomAugspurger · 2018-06-20T15:58:50Z

That slightly changes the meaning :)


jorisvandenbossche · 2018-06-20T16:02:02Z

That slightly changes the meaning :)

But in the correct sense?

From reading the current whatsnew, I understood that you removed all test .py files, so one could not run any test (which of course contradicts the rest of the sentence, but still ..)

TomAugspurger · 2018-06-20T16:03:23Z

Right, the old wording sounded like everything was being removed. I'll push an update once appveyor finishes.


jreback · 2018-06-21T00:28:25Z

pandas/tests/io/json/test_pandas.py

@@ -59,7 +60,8 @@ def setup_method(self, method):
        self.mixed_frame = _mixed_frame.copy()
        self.categorical = _cat_frame.copy()

-    def teardown_method(self, method):
+        yield


yeah we should really not use this pattern, rather changing to all fixtures. as a temporary workaround this is ok, can you create an issue to 'fix' this properly though.

jreback · 2018-06-21T00:29:34Z

pandas/tests/io/test_html.py

@@ -25,8 +25,7 @@
 import pandas.util._test_decorators as td
 from pandas.util.testing import makeCustomDataframe as mkdf, network

-
-DATA_PATH = tm.get_data_path()
+HERE = os.path.dirname(__file__)


do we still need this?

jreback · 2018-06-21T00:30:14Z

pandas/tests/io/test_packers.py

+        # GH12142 0.17 files packed in P2 can't be read in P3
+        if (compat.PY3 and version.startswith('0.17.') and
+                legacy_packer.split('.')[-4][-1] == '2'):
+            pytest.skip("Files packed in Py2 can't be read in Py3.")


can you add the filename here

jreback · 2018-06-21T00:32:01Z

doc/source/whatsnew/v0.23.2.txt

@@ -37,6 +37,11 @@ Documentation Changes
 -
 -



can you add a ref here as well

General note: IMO it is not needed to always ask this of contributors, as this ref is only needed when we actually want to make an explicit link to it from within the rst files (and chances are quite high we will never do this). The ref can always be added at the moment one adds a link.

sure, but in general it's good practice

TomAugspurger · 2018-06-21T14:19:42Z

pandas/tests/io/test_html.py


-DATA_PATH = tm.get_data_path()
+
+def pytest_generate_tests(metafunc):


What do people think about this approach?

This seems to be the recommended way to dynamically generate parametrized fixtures, based on some condition at runtime (the files present in this folder).

Actually, in this case I'm not sure it makes sense...

When the files aren't present, paths will be empty.

Actually, this probably is what we want. When the files are present, we'll get one fixture per file, which is exactly what we want. When they aren't present, nothing is collected (no fixtures), which is fine.

But will it then still fail in the strict case?

The files won't be there, so no fixtures are generated. It won't "exist", so there's nothing to fail :)

I think that's impossible for dynamically generated fixtures like this (which may be your point).

The alternative is to explicitly list them, which isn't so difficult, as there are only 8. Then we'll have a fixture like html_files. Going forward, if we have to add an additional HTML file, we would add it there. I'd prefer being explicit in this case, since there are so few files.

And the advantage of explicitly listing them is also that we don't need to use this dark pytest magic :-)

Just the normal pytest magic :)

Late here but I don’t mind this approach (not that different from metaclasses in Python). Is it not theoretically possible though to just fail the generated tests if the strict parameter is supplied yet no files are found?

In this case, I'd prefer to explicitly list them since there are so few.
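The trade-off discussed above can be sketched without pytest. Discovery by glob returns an empty list when the files are absent, so nothing is generated and nothing can fail. An explicit list can be checked name by name, so a strict run has something to fail on. The file names below are hypothetical placeholders, not the 8 real files.

```python
import glob
import os
import tempfile

# Hypothetical file names standing in for the real HTML files.
EXPECTED_HTML_FILES = ["first_utf-8.html", "second_utf-16.html"]


def discover_by_glob(directory):
    """Dynamic discovery: silently yields nothing when the files are absent."""
    return sorted(glob.glob(os.path.join(directory, "*.html")))


def missing_from_explicit_list(directory, names=EXPECTED_HTML_FILES):
    """Explicit listing: every expected name is checked, so gaps are visible."""
    return [
        name for name in names
        if not os.path.exists(os.path.join(directory, name))
    ]


with tempfile.TemporaryDirectory() as empty_dir:
    # Simulates a distribution that ships without the data files.
    globbed = discover_by_glob(empty_dir)
    missing = missing_from_explicit_list(empty_dir)
```

With the glob, a strict run sees zero tests and passes trivially; with the explicit list, `missing` names every absent file.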

TomAugspurger · 2018-06-21T14:20:07Z

pandas/tests/io/test_html.py

-        os.path.join(DATA_PATH, 'html_encoding', '*.html')))
-    def test_encode(self, f):
-        _, encoding = os.path.splitext(os.path.basename(f))[0].split('_')
+    def test_encode(self, html_file):


this is where html_file is used.
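For context, the line removed in the diff above derived the encoding from the file name, which it assumes has the form `<name>_<encoding>.html`. A small sketch of that parsing follows; the helper name and the sample file name are hypothetical.

```python
import os


def encoding_from_filename(path):
    """Extract the encoding from a '<name>_<encoding>.html' style file name."""
    stem = os.path.splitext(os.path.basename(path))[0]
    # Assumes exactly one underscore in the stem, as the original line did.
    _, encoding = stem.split("_")
    return encoding


example = encoding_from_filename(
    os.path.join("html_encoding", "sample_utf-16.html")
)
```

A file name with more than one underscore would raise a ValueError here, as it would with the original unpacking.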

jreback · 2018-06-26T12:14:19Z

test_foo.py

@@ -0,0 +1,22 @@
+import pytest


@TomAugspurger ?

jreback · 2018-06-26T12:27:26Z

lgtm otherwise. merge when ready.

TomAugspurger · 2018-06-26T15:02:22Z

Fixed a merge conflict.

(cherry picked from commit 36422a8)

PKG: Exclude data test files.

4d77cd8

TomAugspurger added the Testing (pandas testing functions or related to the test suite) and Build (Library building on various platforms) labels Feb 4, 2018

TomAugspurger commented Feb 4, 2018

Stuff

270e442

jreback requested changes Feb 8, 2018

TomAugspurger mentioned this pull request Feb 13, 2018

Pandas install includes hefty 29MB tests/ directory #19681

Closed

TomAugspurger added 3 commits February 22, 2018 15:47

Merge remote-tracking branch 'upstream/master' into package-size

26e9b4b

Refactor data path handling

1804bcc

More fixtures

7022152

jreback requested changes Feb 25, 2018

TomAugspurger added 10 commits March 26, 2018 15:20

Merge remote-tracking branch 'upstream/master' into package-size

080f000

Updated html

151ffda

Remove os.path.joins

d9d6570

More modules

5849591

Some more

31fb0b6

Merge remote-tracking branch 'upstream/master' into package-size

9193f15

Updated packers

e897f11

Pickle

9cf30fd

Linting

95cde7a

Autouse stata

10ddddc

TomAugspurger changed the title from "[WIP]PKG: Exclude data test files." to "PKG: Exclude data test files." Mar 27, 2018

TomAugspurger added 3 commits March 27, 2018 09:10

Remove filename

e1ea208

Autouse in merge_asof

8616878

Cleanup plotting

77bf77c

TomAugspurger added 2 commits June 20, 2018 08:14

Fixed windows

7fd7660

whatsnew

c187f8b

TomAugspurger mentioned this pull request Jun 20, 2018

Compile failed while installing pandas with pip #11265

Closed

jorisvandenbossche reviewed Jun 20, 2018

Clarify note [ci skip]

632a61d

jreback requested changes Jun 21, 2018

jreback added this to the 0.23.2 milestone Jun 21, 2018

jreback requested changes Jun 21, 2018

TST: refactored html tests

b5b70c7

TomAugspurger mentioned this pull request Jun 21, 2018

TST: Change class-based auto-use setup to fixtures #21575

Closed

TomAugspurger commented Jun 21, 2018

TomAugspurger added 2 commits June 22, 2018 15:28

Remove auto-generated html fixtures

9954bba

linting

c771885

jreback reviewed Jun 26, 2018

Removed test test file

dd75270

jreback added the Needs Backport label Jun 26, 2018

jreback approved these changes Jun 26, 2018

Merge remote-tracking branch 'upstream/master' into package-size

dbe0c57

TomAugspurger merged commit 36422a8 into pandas-dev:master Jun 26, 2018

TomAugspurger deleted the package-size branch June 26, 2018 15:02

jorisvandenbossche removed the Needs Backport label Jul 2, 2018

jorisvandenbossche pushed a commit to jorisvandenbossche/pandas that referenced this pull request Jul 2, 2018

PKG: Exclude data test files. (pandas-dev#19535)

14d65cd

(cherry picked from commit 36422a8)

jorisvandenbossche pushed a commit that referenced this pull request Jul 5, 2018

PKG: Exclude data test files. (#19535)

417e873

(cherry picked from commit 36422a8)

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

PKG: Exclude data test files. (pandas-dev#19535)

fb37759
